04:00
2026-06-16
arxiv.org
artificial-intelligence
X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining
Researchers introduced X-Tokenizer, a multimodal action tokenizer for vision-language-action pretraining that uses Semantic Residual Quantization and Masked Action Modeling to create a discrete action…